Cpu memory graph break #3886

cehongwang · 2025-11-04T20:05:10Z

github-actions

There are some changes that do not conform to Python style guidelines:

--- /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/_compiler.py	2025-11-04 20:05:23.825034+00:00
+++ /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/_compiler.py	2025-11-04 20:05:55.253944+00:00
@@ -876,15 +876,14 @@
    # This is done to release CPU memory.
    for attr in dir(gm):
        if attr.startswith("_frozen_param"):
            delattr(gm, attr)

-
-
    from torch_tensorrt.dynamo.conversion._ConverterRegistry import DYNAMO_CONVERTERS
+
    DYNAMO_CONVERTERS.disallowed_targets = set()
-    
+
    for name, _ in partitioned_module.named_children():
        submodule = getattr(partitioned_module, name)
        # filter on the GraphModule
        if not isinstance(submodule, torch.fx.graph_module.GraphModule):
            continue
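
For reference, the cleanup loop at the top of this hunk can be tried in isolation. The sketch below is illustrative only: `FakeGraphModule` and `release_frozen_params` are hypothetical stand-ins, not names from the PR, and a real `torch.fx.GraphModule` would hold tensors rather than byte arrays.

```python
class FakeGraphModule:
    """Hypothetical stand-in for torch.fx.GraphModule, for illustration only."""

    def __init__(self) -> None:
        self._frozen_param0 = bytearray(1024)
        self._frozen_param1 = bytearray(2048)
        self.weight = bytearray(16)


def release_frozen_params(gm: object) -> list:
    """Delete every attribute whose name starts with "_frozen_param".

    Dropping the references lets the garbage collector reclaim the CPU
    memory they held, provided nothing else still points at them.
    """
    removed = []
    # dir() returns a new list, so deleting attributes while iterating is safe.
    for attr in dir(gm):
        if attr.startswith("_frozen_param"):
            delattr(gm, attr)
            removed.append(attr)
    return removed


gm = FakeGraphModule()
removed = release_frozen_params(gm)
```

Unrelated attributes such as `weight` are left in place; only names with the `_frozen_param` prefix are removed.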

narendasan

Do you have a test case or something to demonstrate this feature?

py/torch_tensorrt/dynamo/partitioning/_adjacency_partitioner.py

narendasan · 2025-11-06T20:02:59Z

We should think about using this technique for the refit vs. non-refit cases, and make the refit APIs work across graph breaks.

narendasan · 2025-11-06T20:04:57Z

Improve usability by automating the nn.Module -> atomic FX graph conversion.

py/torch_tensorrt/dynamo/partitioning/_atomic_subgraphs.py

py/torch_tensorrt/dynamo/partitioning/fusion_patterns.py

narendasan · 2025-11-07T21:19:54Z

py/torch_tensorrt/dynamo/_defaults.py

 L2_LIMIT_FOR_TILING = -1
 USE_DISTRIBUTED_MODE_TRACE = False
 OFFLOAD_MODULE_TO_CPU = False
+CPU_MEMORY_BUDGET = -1


Use an Optional instead. Since this is not a TRT API, we don't need -1 to mean "let us decide".
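
A minimal sketch of what that suggestion could look like, assuming the budget is expressed in bytes. The helper `resolve_cpu_memory_budget` and its `available_bytes` parameter are hypothetical and not taken from the PR; only the name `CPU_MEMORY_BUDGET` comes from the diff above.

```python
from typing import Optional

# None means "the user did not set a budget", replacing the -1 sentinel.
CPU_MEMORY_BUDGET: Optional[int] = None


def resolve_cpu_memory_budget(budget: Optional[int], available_bytes: int) -> int:
    """Hypothetical helper: turn an optional user budget into a concrete limit.

    Assumes both values are in bytes. When no budget is given, fall back to
    whatever the caller reports as available.
    """
    if budget is None:
        return available_bytes
    if budget <= 0:
        raise ValueError("CPU memory budget must be a positive number of bytes")
    # Never promise more memory than the machine actually has.
    return min(budget, available_bytes)
```

With an `Optional`, an invalid value such as -1 can be rejected outright instead of being silently treated as "unset".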

py/torch_tensorrt/dynamo/utils.py

narendasan · 2025-11-07T21:20:58Z

tests/py/dynamo/models/test_models.py

    torch._dynamo.reset()


+def compile_one(idx: int, ir: str):


Why is this test here?

narendasan · 2025-11-07T21:23:11Z

py/torch_tensorrt/dynamo/partitioning/_adjacency_partitioner.py

+
+    def size_of_subgraphs(self, subgraphs: List[Subgraph]) -> List[int]:
+        """
+        This function calculates the size of the subgraph.


Can you describe the algorithms here so we have a reference for later?
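
The thread does not show how `size_of_subgraphs` works, so the following is only one plausible approach and not the PR's implementation: sum the bytes of the distinct parameters each subgraph references. The `ParamInfo` and `SubgraphSketch` types are hypothetical simplifications; the real method takes `List[Subgraph]` from the partitioner.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ParamInfo:
    """Hypothetical metadata for one parameter tensor."""
    numel: int      # number of elements
    itemsize: int   # bytes per element (e.g. 4 for float32)


@dataclass
class SubgraphSketch:
    """Hypothetical stand-in for a partitioner Subgraph."""
    param_names: List[str]  # parameters referenced by nodes in the subgraph


def size_of_subgraphs(
    subgraphs: List[SubgraphSketch], params: Dict[str, ParamInfo]
) -> List[int]:
    """Return an estimated size in bytes for each subgraph.

    A parameter referenced several times within one subgraph is counted once.
    """
    sizes = []
    for subgraph in subgraphs:
        unique_names = set(subgraph.param_names)
        sizes.append(
            sum(params[name].numel * params[name].itemsize for name in unique_names)
        )
    return sizes


params = {"w1": ParamInfo(numel=1000, itemsize=4), "w2": ParamInfo(numel=10, itemsize=2)}
subgraphs = [SubgraphSketch(["w1", "w1", "w2"]), SubgraphSketch(["w2"])]
sizes = size_of_subgraphs(subgraphs, params)
```

Under these assumptions, the first subgraph counts `w1` once despite two references, which matters when weights are shared between nodes.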

narendasan

It's looking good. Just add a quick example in the examples folder and list it under the contributor documentation for now.

py/torch_tensorrt/dynamo/partitioning/_atomic_subgraphs.py

py/torch_tensorrt/dynamo/_compiler.py

py/torch_tensorrt/dynamo/_settings.py

examples/dynamo/low_cpu_memory_compilation.py

meta-cla bot added the cla signed label Nov 4, 2025

github-actions bot requested a review from peri044 November 4, 2025 20:05

github-actions bot requested changes Nov 4, 2025

narendasan reviewed Nov 4, 2025

cehongwang force-pushed the cpu-memory-graph-break branch from 7f0e504 to 18ccadf November 5, 2025 22:03

cehongwang force-pushed the cpu-memory-graph-break branch from 18ccadf to f03ab2c November 6, 2025 20:06